Robust Nonparametric Data Approximation of Point Sets via Data Reduction
Abstract
In this paper we present a novel non-parametric method of simplifying piecewise linear curves, and we apply this method as a statistical approximation of structure within sequential data in the plane. We consider the problem of minimizing the average length of sequences of consecutive input points that lie on any one side of the simplified curve. Specifically, given a sequence P of n points in the plane that determine a simple polygonal chain consisting of n−1 segments, we describe algorithms for selecting an ordered subset Q ⊂ P (including the first and last points of P) that determines a second polygonal chain to approximate P, such that the number of crossings between the two polygonal chains is maximized, and the cardinality of Q is minimized among all such maximizing subsets of P. Our algorithms have respective running times O(n log n) when P is monotonic and O(n log n) when P is an arbitrary simple polyline. Finally, we examine the application of our algorithms iteratively in a bootstrapping technique to define a smooth robust non-parametric approximation of the original sequence.
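The objective stated in the abstract can be illustrated with a small sketch. The code below is not the authors' algorithm: it is an exhaustive search, exponential in n, intended only to make the objective concrete on tiny inputs. It assumes P is x-monotone with strictly increasing x-coordinates, and it uses the number of side changes among the non-selected points of P as a stand-in for the number of crossings between the two chains. All function names are hypothetical.

```python
from itertools import combinations


def side_signs(P, Q):
    """Return +1/-1 for each point of P strictly above/below the chain through Q.

    P is a list of (x, y) tuples with strictly increasing x (assumption).
    Q is a sorted list of indices into P that includes 0 and len(P) - 1.
    Points lying exactly on the chain (including the points of Q) are skipped.
    """
    signs = []
    for a, b in zip(Q, Q[1:]):
        (x0, y0), (x1, y1) = P[a], P[b]
        for k in range(a + 1, b):
            x, y = P[k]
            # Cross product of (Q-segment direction) and (point - segment start);
            # positive means the point is above the segment when x increases.
            cross = (x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)
            if cross != 0:
                signs.append(1 if cross > 0 else -1)
    return signs


def count_side_changes(P, Q):
    """Number of times consecutive off-chain points of P switch sides of Q."""
    signs = side_signs(P, Q)
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)


def average_run_length(P, Q):
    """Average length of maximal runs of consecutive points on one side of Q."""
    signs = side_signs(P, Q)
    if not signs:
        return 0.0
    runs = 1 + count_side_changes(P, Q)
    return len(signs) / runs


def brute_force_simplify(P):
    """Try every subset Q containing both endpoints (requires len(P) >= 2).

    Returns (side_changes, Q) with side changes maximized and, among the
    maximizers, the fewest points. Subsets are tried in order of increasing
    size, so the first subset reaching a new maximum is a smallest one.
    """
    n = len(P)
    best_count, best_Q = -1, None
    for k in range(0, n - 1):  # number of interior points kept: 0 .. n-2
        for mid in combinations(range(1, n - 1), k):
            Q = [0, *mid, n - 1]
            c = count_side_changes(P, Q)
            if c > best_count:
                best_count, best_Q = c, Q
    return best_count, best_Q


# A zigzag around the x-axis: the two endpoints alone already separate
# the interior points into three alternating runs.
P_example = [(0, 0), (1, 1), (2, -1), (3, 1), (4, 0)]
print(brute_force_simplify(P_example))  # (2, [0, 4])
```

More side changes for a fixed number of off-chain points means more runs and therefore shorter runs on average, which is how maximizing crossings relates to minimizing the average run length in this sketch.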
Similar resources
A New Approach for Determination of Neck-Pore Size Distribution of Porous Membranes via Bubble Point Data
Reliable estimation of the neck-pore size distribution (NPSD) of porous membranes is the key element in the design and operation of all membrane separation processes. In this paper, a new approach is presented for reliable estimation of the NPSD of porous membranes using wet flow-state bubble point test data. For this purpose, a robust method based on the linear regularization theory is developed to extract NPSD...
Asymptotic Behaviors of the Lorenz Curve for Left Truncated and Dependent Data
The purpose of this paper is to provide some asymptotic results for the nonparametric estimator of the Lorenz curve and the Lorenz process in the case where the data are assumed to be strong mixing and subject to random left truncation. First, we show that the nonparametric estimator of the Lorenz curve is uniformly strongly consistent for the associated Lorenz curve. Also, a strong Gaussian approximation for ...
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: With advances in science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...
Leveraging the Power of Big Data for Robust Process Operations under Uncertainty
We propose a data-driven outlier-insensitive adaptive robust optimization framework that leverages big data in industries. A Bayesian nonparametric model – the Dirichlet process mixture model – is adopted to extract the information embedded within uncertainty data via a variational inference algorithm. We then devise data-driven uncertainty sets for adaptive robust optimization. This Bayesian n...
Stochastic Gradient Descent Methods for Estimation with Large Data Sets
We develop methods for parameter estimation in settings with large-scale data sets, where traditional methods are no longer tenable. Our methods rely on stochastic approximations, which are computationally efficient as they maintain one iterate as a parameter estimate, and successively update that iterate based on a single data point. When the update is based on a noisy gradient, the stochastic...
Publication date: 2012